Replication of the results in Davidson et al. (2017), "Automated Hate Speech Detection and the Problem of Offensive Language"


In [1]:
import pandas as pd
import numpy as np
import pickle
import sys
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk
from nltk.stem.porter import *
import string
import re
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer as VS
from textstat.textstat import *
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import classification_report
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt
import seaborn
%matplotlib inline

Loading the data


In [3]:
# Open the pickle in a context manager so the file handle is closed after loading
with open("../data/labeled_data.p", 'rb') as f:
    df = pickle.load(f)

In [4]:
df


Out[4]:
count hate_speech offensive_language neither class tweet
0 3 0 0 3 2 !!! RT @mayasolovely: As a woman you shouldn't...
1 3 0 3 0 1 !!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2 3 0 3 0 1 !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3 3 0 2 1 1 !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4 6 0 6 0 1 !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...
5 3 1 2 0 1 !!!!!!!!!!!!!!!!!!"@T_Madison_x: The shit just...
6 3 0 3 0 1 !!!!!!"@__BrighterDays: I can not just sit up ...
7 3 0 3 0 1 !!!!“@selfiequeenbri: cause I'm tired of...
8 3 0 3 0 1 " & you might not get ya bitch back & ...
9 3 1 2 0 1 " @rhythmixx_ :hobbies include: fighting Maria...
10 3 0 3 0 1 " Keeks is a bitch she curves everyone " lol I...
11 3 0 3 0 1 " Murda Gang bitch its Gang Land "
12 3 0 2 1 1 " So hoes that smoke are losers ? " yea ... go...
13 3 0 3 0 1 " bad bitches is the only thing that i like "
14 3 1 2 0 1 " bitch get up off me "
15 3 0 3 0 1 " bitch nigga miss me with it "
16 3 0 3 0 1 " bitch plz whatever "
17 3 1 2 0 1 " bitch who do you love "
18 3 0 3 0 1 " bitches get cut off everyday B "
19 3 0 3 0 1 " black bottle & a bad bitch "
20 3 0 3 0 1 " broke bitch cant tell me nothing "
21 3 0 3 0 1 " cancel that bitch like Nino "
22 3 0 3 0 1 " cant you see these hoes wont change "
23 3 0 3 0 1 " fuck no that bitch dont even suck dick " &#1...
24 3 0 3 0 1 " got ya bitch tip toeing on my hardwood floor...
25 3 0 2 1 1 " her pussy lips like Heaven doors " 😌
26 3 0 3 0 1 " hoe what its hitting for "
27 3 0 3 0 1 " i met that pussy on Ocean Dr . i gave that p...
28 3 0 3 0 1 " i need a trippy bitch who fuck on Hennessy "
29 3 0 3 0 1 " i spend my money how i want bitch its my bus...
... ... ... ... ... ... ...
25266 3 1 2 0 1 you ain't gotta be a dyke to like hoes
25267 3 0 3 0 1 you are a hoe, hoe, & a hoe.
25268 3 0 3 0 1 you bitches love yall some corny nigga
25269 3 0 3 0 1 you can masturbate anytime bitch lol “@g...
25270 3 0 3 0 1 you can never get a group of hoes together wit...
25271 3 0 3 0 1 you can tell when dick recently been in a puss...
25272 3 0 3 0 1 you can't cuff a hoe lmao
25273 3 0 2 1 1 you drove me redneck crazy
25274 3 0 3 0 1 you fake niggah lolol
25275 3 1 2 0 1 you got niggas, and i got bitches.
25276 3 0 2 1 1 you gotta be a new breed of retarded if you do...
25277 3 0 3 0 1 you gotta understand that these bitches are ch...
25278 3 0 3 0 1 you hoe spice
25279 3 0 3 0 1 you just want some attention hoe
25280 3 0 1 2 2 you know what they say, the early bird gets th...
25281 3 0 3 0 1 you know what your doing when you favorite a t...
25282 3 0 3 0 1 you lil dumb ass bitch, i ain't fuckin wit chu...
25283 3 0 3 0 1 you look like AC Green...bitch don't call here...
25284 3 0 3 0 1 you look like your 12 stop talking about fucki...
25285 3 0 3 0 1 you might as well gone pussy pop on a stage
25286 3 1 2 0 1 you niggers cheat on ya gf's? smh....
25287 3 0 3 0 1 you really care bout dis bitch. my dick all in...
25288 3 0 3 0 1 you worried bout other bitches, you need me for?
25289 3 3 0 0 0 you're all niggers
25290 3 2 1 0 0 you're such a retard i hope you get type 2 dia...
25291 3 0 2 1 1 you's a muthaf***in lie “@LifeAsKing: @2...
25292 3 0 1 2 2 you've gone and broke the wrong heart baby, an...
25294 3 0 3 0 1 young buck wanna eat!!.. dat nigguh like I ain...
25295 6 0 6 0 1 youu got wild bitches tellin you lies
25296 3 0 0 3 2 ~~Ruffled | Ntac Eileen Dahlia - Beautiful col...

24783 rows × 6 columns


In [5]:
df.describe()


Out[5]:
count hate_speech offensive_language neither
count 24783.000000 24783.000000 24783.000000 24783.000000
mean 3.243473 0.280515 2.413711 0.549247
std 0.883060 0.631851 1.399459 1.113299
min 3.000000 0.000000 0.000000 0.000000
25% 3.000000 0.000000 2.000000 0.000000
50% 3.000000 0.000000 3.000000 0.000000
75% 3.000000 0.000000 3.000000 0.000000
max 9.000000 7.000000 9.000000 9.000000

In [6]:
df.columns


Out[6]:
Index([u'count', u'hate_speech', u'offensive_language', u'neither', u'class',
       u'tweet'],
      dtype='object')

Columns key:

count = number of CrowdFlower (CF) users who coded each tweet (the minimum is 3; more users coded a tweet when CF determined the judgments to be unreliable).

hate_speech = number of CF users who judged the tweet to be hate speech.

offensive_language = number of CF users who judged the tweet to be offensive.

neither = number of CF users who judged the tweet to be neither hate speech nor offensive.

class = class label assigned by the majority of CF users.

0 - hate speech
1 - offensive language
2 - neither

tweet = raw tweet text
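
As a rough illustration of how the `class` column relates to the three vote counts, the sketch below picks the category with the most votes. The function name `majority_label` and the tie-breaking rule (lowest class index wins) are assumptions made for this example; the notebook does not show how ties were resolved.

```python
def majority_label(hate_speech, offensive_language, neither):
    """Return 0, 1 or 2 for the category with the most coder votes.

    Ties go to the lowest class index, which is an assumption of this
    sketch rather than documented CrowdFlower behavior.
    """
    votes = [hate_speech, offensive_language, neither]
    return votes.index(max(votes))


# Vote counts copied from rows shown in Out[4]
row_0 = majority_label(0, 0, 3)       # labeled class 2 in the data
row_1 = majority_label(0, 3, 0)       # labeled class 1 in the data
row_25290 = majority_label(2, 1, 0)   # labeled class 0 in the data
print(row_0, row_1, row_25290)
```

For the rows checked here, the majority vote agrees with the `class` value stored in the data.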


In [7]:
df['class'].hist()


Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x11e10ad50>

This histogram shows how imbalanced the task is: most tweets containing "hate" words as defined by Hatebase were considered only offensive by the CF coders. More tweets were considered to be neither hate speech nor offensive language than were considered hate speech.
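
The imbalance can be put in numbers using the per-class counts that this notebook prints in the `support` column of its classification report. The dictionary `class_counts` below is typed in by hand from that report; it is an illustration, not part of the original analysis.

```python
# Per-class tweet counts, copied from the support column of the
# classification report printed later in this notebook.
class_counts = {"hate speech": 1430, "offensive language": 19190, "neither": 4163}

total = sum(class_counts.values())
shares = {name: count / float(total) for name, count in class_counts.items()}

for name in sorted(shares, key=shares.get, reverse=True):
    print("{:<20s} {:6.1%}".format(name, shares[name]))
```

Roughly three quarters of the tweets fall in the offensive-language class, which is why the models below are fitted with `class_weight='balanced'`.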


In [8]:
tweets=df.tweet

Feature generation


In [9]:
stopwords = nltk.corpus.stopwords.words("english")

other_exclusions = ["#ff", "ff", "rt"]
stopwords.extend(other_exclusions)

stemmer = PorterStemmer()


def preprocess(text_string):
    """
    Accepts a text string and:
    1) collapses runs of whitespace into a single space
    2) removes urls
    3) removes mentions

    This strips urls and specific user handles from the text
    before it is tokenized.
    """
    space_pattern = r'\s+'
    giant_url_regex = (r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|'
        r'[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
    mention_regex = r'@[\w\-]+'
    parsed_text = re.sub(space_pattern, ' ', text_string)
    parsed_text = re.sub(giant_url_regex, '', parsed_text)
    parsed_text = re.sub(mention_regex, '', parsed_text)
    return parsed_text

def tokenize(tweet):
    """Removes punctuation & excess whitespace, sets to lowercase,
    and stems tweets. Returns a list of stemmed tokens."""
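    # Use + rather than *: a pattern that can match the empty string makes
    # re.split break the text into single characters on Python 3.7+
    tweet = " ".join(re.split("[^a-zA-Z]+", tweet.lower())).strip()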
    tokens = [stemmer.stem(t) for t in tweet.split()]
    return tokens

def basic_tokenize(tweet):
    """Same as tokenize but without the stemming;
    also keeps the punctuation marks . , ! ?"""
    tweet = " ".join(re.split("[^a-zA-Z.,!?]+", tweet.lower())).strip()
    return tweet.split()

vectorizer = TfidfVectorizer(
    tokenizer=tokenize,
    preprocessor=preprocess,
    ngram_range=(1, 3),
    stop_words=stopwords,
    use_idf=True,
    smooth_idf=False,
    norm=None,
    decode_error='replace',
    max_features=10000,
    min_df=5,
    max_df=0.75
    )
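
To see what the text-cleaning steps do, here is a standalone sketch that repeats the regular expressions from `preprocess` and a letters-only tokenizer on a made-up tweet. The helper names `strip_twitter_objects` and `letters_only_tokens` and the sample text are hypothetical; stemming and stop-word removal are left out so that the example needs only the standard library.

```python
import re

URL_REGEX = (r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|'
             r'[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
MENTION_REGEX = r'@[\w\-]+'


def strip_twitter_objects(text):
    """Collapse whitespace, then delete URLs and @mentions."""
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(URL_REGEX, '', text)
    text = re.sub(MENTION_REGEX, '', text)
    return text


def letters_only_tokens(text):
    """Lowercase the text and split on runs of non-letter characters."""
    return " ".join(re.split(r"[^a-zA-Z]+", text.lower())).strip().split()


# Invented example text, not taken from the data set
sample = "RT @some_user: Check   this out https://example.com/page #ff"
cleaned = strip_twitter_objects(sample)
tokens = letters_only_tokens(cleaned)
print(repr(cleaned))
print(tokens)
```

In the notebook's vectorizer the tokens would additionally be stemmed, and stop words (including the added "rt", "ff" and "#ff") would be dropped.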

In [10]:
#Construct tfidf matrix and get relevant scores
tfidf = vectorizer.fit_transform(tweets).toarray()
vocab = {v:i for i, v in enumerate(vectorizer.get_feature_names())}
idf_vals = vectorizer.idf_
idf_dict = {i:idf_vals[i] for i in vocab.values()} #keys are indices; values are IDF scores
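
With `use_idf=True`, `smooth_idf=False` and `norm=None`, scikit-learn documents the weight of a term as its raw count in the tweet times `idf(t) = ln(n / df(t)) + 1`, where `n` is the number of documents and `df(t)` the number of documents containing the term. The sketch below evaluates that formula by hand; the function name `unsmoothed_idf` and the example document frequencies are chosen for illustration only.

```python
import math


def unsmoothed_idf(n_documents, document_frequency):
    """IDF without smoothing: ln(n / df) + 1."""
    return math.log(n_documents / float(document_frequency)) + 1.0


n_tweets = 24783  # number of rows reported for df

# min_df=5 means the rarest terms kept appear in 5 tweets;
# max_df=0.75 drops terms that appear in more than 75% of tweets.
rarest_kept = unsmoothed_idf(n_tweets, 5)
most_common_kept = unsmoothed_idf(n_tweets, int(0.75 * n_tweets))
print(round(rarest_kept, 3), round(most_common_kept, 3))
```

Because `norm=None`, these weights are not rescaled per tweet, so longer tweets produce larger feature values.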

In [11]:
#Get POS tags for tweets and save as a string
tweet_tags = []
for t in tweets:
    tokens = basic_tokenize(preprocess(t))
    tags = nltk.pos_tag(tokens)
    tag_list = [x[1] for x in tags]
    tag_str = " ".join(tag_list)
    tweet_tags.append(tag_str)

In [12]:
#We can use the TFIDF vectorizer to get a token matrix for the POS tags
pos_vectorizer = TfidfVectorizer(
    tokenizer=None,
    lowercase=False,
    preprocessor=None,
    ngram_range=(1, 3),
    stop_words=None,
    use_idf=False,
    smooth_idf=False,
    norm=None,
    decode_error='replace',
    max_features=5000,
    min_df=5,
    max_df=0.75,
    )

In [13]:
#Construct POS TF matrix and get vocab dict
pos = pos_vectorizer.fit_transform(pd.Series(tweet_tags)).toarray()
pos_vocab = {v:i for i, v in enumerate(pos_vectorizer.get_feature_names())}
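
Because `pos_vectorizer` is built with `tokenizer=None`, it falls back on scikit-learn's default token pattern, `(?u)\b\w\w+\b`, which keeps only runs of two or more word characters. Punctuation tags such as `.` are therefore dropped, and a tag like `PRP$` is read as `PRP`. The sketch below applies that pattern to a hypothetical tag string and lists the 1- to 3-grams that would be counted; `tag_ngrams` is an illustrative helper, not part of scikit-learn.

```python
import re

DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def tag_ngrams(tag_string, min_n=1, max_n=3):
    """Tokenize a space-joined tag string and return its word n-grams."""
    tokens = re.findall(DEFAULT_TOKEN_PATTERN, tag_string)
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# Hypothetical tag sequence for a short sentence
example_tags = "PRP$ NN VBZ JJ ."
grams = tag_ngrams(example_tags)
print(grams)
```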

In [14]:
#Now get other features
sentiment_analyzer = VS()

def count_twitter_objs(text_string):
    """
    Accepts a text string and replaces:
    1) urls with URLHERE
    2) lots of whitespace with one instance
    3) mentions with MENTIONHERE
    4) hashtags with HASHTAGHERE

    This allows us to get standardized counts of urls and mentions
    Without caring about specific people mentioned.
    
    Returns counts of urls, mentions, and hashtags.
    """
    space_pattern = r'\s+'
    giant_url_regex = (r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|'
        r'[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
    mention_regex = r'@[\w\-]+'
    hashtag_regex = r'#[\w\-]+'
    parsed_text = re.sub(space_pattern, ' ', text_string)
    parsed_text = re.sub(giant_url_regex, 'URLHERE', parsed_text)
    parsed_text = re.sub(mention_regex, 'MENTIONHERE', parsed_text)
    parsed_text = re.sub(hashtag_regex, 'HASHTAGHERE', parsed_text)
    return(parsed_text.count('URLHERE'),parsed_text.count('MENTIONHERE'),parsed_text.count('HASHTAGHERE'))

def other_features(tweet):
    """This function takes a string and returns a list of features.
    These include Sentiment scores, Text and Readability scores,
    as well as Twitter specific features"""
    sentiment = sentiment_analyzer.polarity_scores(tweet)
    
    words = preprocess(tweet) #Get text only
    
    syllables = textstat.syllable_count(words)
    num_chars = sum(len(w) for w in words)
    num_chars_total = len(tweet)
    num_terms = len(tweet.split())
    num_words = len(words.split())
    avg_syl = round(float((syllables+0.001))/float(num_words+0.001),4)
    num_unique_terms = len(set(words.split()))
    
    ###Modified FK grade, where avg words per sentence is just num words/1
    FKRA = round(float(0.39 * float(num_words)/1.0) + float(11.8 * avg_syl) - 15.59,1)
    ##Modified FRE score, where sentence fixed to 1
    FRE = round(206.835 - 1.015*(float(num_words)/1.0) - (84.6*float(avg_syl)),2)
    
    twitter_objs = count_twitter_objs(tweet)
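    retweet = 0
    # Substring test on the preprocessed string, which is not lowercased:
    # it matches "rt" inside longer words and misses an uppercase "RT".
    if "rt" in words:
        retweet = 1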
    features = [FKRA, FRE,syllables, avg_syl, num_chars, num_chars_total, num_terms, num_words,
                num_unique_terms, sentiment['neg'], sentiment['pos'], sentiment['neu'], sentiment['compound'],
                twitter_objs[2], twitter_objs[1],
                twitter_objs[0], retweet]
    #features = pandas.DataFrame(features)
    return features

def get_feature_array(tweets):
    feats=[]
    for t in tweets:
        feats.append(other_features(t))
    return np.array(feats)
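
The two readability scores in `other_features` are the Flesch-Kincaid grade level and the Flesch reading ease with the sentence count fixed at 1, so "words per sentence" is simply the word count. A standalone sketch of the same arithmetic follows; the function name `modified_readability` and the input numbers are chosen for illustration.

```python
def modified_readability(num_words, syllables):
    """Return (avg syllables per word, modified FK grade, modified FRE).

    Each tweet is treated as a single sentence, as in other_features.
    The 0.001 terms guard against division by zero for empty tweets.
    """
    avg_syl = round(float(syllables + 0.001) / float(num_words + 0.001), 4)
    fkra = round(0.39 * num_words + 11.8 * avg_syl - 15.59, 1)
    fre = round(206.835 - 1.015 * num_words - 84.6 * avg_syl, 2)
    return avg_syl, fkra, fre


# A hypothetical tweet with 10 words and 14 syllables
avg_syl, fkra, fre = modified_readability(10, 14)
print(avg_syl, fkra, fre)
```

Longer tweets and tweets with more syllables per word get a higher grade level and a lower reading-ease score.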

In [15]:
other_features_names = ["FKRA", "FRE","num_syllables", "avg_syl_per_word", "num_chars", "num_chars_total", \
                        "num_terms", "num_words", "num_unique_words", "vader neg","vader pos","vader neu", \
                        "vader compound", "num_hashtags", "num_mentions", "num_urls", "is_retweet"]

In [16]:
feats = get_feature_array(tweets)

In [17]:
#Now join them all up
M = np.concatenate([tfidf,pos,feats],axis=1)

In [18]:
M.shape


Out[18]:
(24783, 11172)

In [19]:
#Finally get a list of variable names
variables = ['']*len(vocab)
for k,v in vocab.items():  # items() works on both Python 2 and Python 3
    variables[v] = k

pos_variables = ['']*len(pos_vocab)
for k,v in pos_vocab.items():  # items() works on both Python 2 and Python 3
    pos_variables[v] = k

feature_names = variables+pos_variables+other_features_names
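
Since `vocab` maps each term to its column index, filling a list by index amounts to sorting the terms by that index. The toy dictionary below (terms and indices are invented) shows that both approaches give the same ordering.

```python
# Invented term -> column index mapping, standing in for `vocab`
toy_vocab = {"cat": 2, "a": 0, "bird": 1}

# Approach used in the notebook: place each term at its column index
names_by_loop = [''] * len(toy_vocab)
for term, index in toy_vocab.items():
    names_by_loop[index] = term

# Compact alternative: sort the terms by their column index
names_by_sort = sorted(toy_vocab, key=toy_vocab.get)

print(names_by_loop, names_by_sort)
```

Either way, position `i` in the resulting list names column `i` of the corresponding feature matrix, which is what makes the concatenated `feature_names` line up with the columns of `M`.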

Running the model

The best model was selected using a GridSearch with 5-fold CV.


In [20]:
X = pd.DataFrame(M)
y = df['class'].astype(int)

In [21]:
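# liblinear was the default solver when this notebook was written; it is named
# explicitly because the later default (lbfgs) does not support an L1 penalty.
select = SelectFromModel(LogisticRegression(class_weight='balanced',penalty="l1",C=0.01,solver='liblinear'))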
X_ = select.fit_transform(X,y)
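
`class_weight='balanced'` makes scikit-learn weight each class by `n_samples / (n_classes * count_of_class)`, so errors on the rare hate-speech class cost more in the loss. The sketch below evaluates that formula with the class counts from the `support` column of the classification report in this notebook; the variable names are illustrative.

```python
# Class counts copied from the support column of the classification report
support = {0: 1430, 1: 19190, 2: 4163}

n_samples = sum(support.values())
n_classes = len(support)

# Balanced weight: n_samples / (n_classes * count_of_class)
balanced_weights = {
    label: n_samples / float(n_classes * count)
    for label, count in support.items()
}

for label in sorted(balanced_weights):
    print(label, round(balanced_weights[label], 3))
```

With these weights, one hate-speech tweet counts for more than ten offensive-language tweets.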

In [22]:
model = LinearSVC(class_weight='balanced',C=0.01, penalty='l2', loss='squared_hinge',multi_class='ovr').fit(X_, y)

In [23]:
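# Rebinds `model`: the LinearSVC fitted in the previous cell is discarded,
# and the predictions below come from this logistic regression.
model = LogisticRegression(class_weight='balanced',penalty='l2',C=0.01).fit(X_,y)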

In [24]:
y_preds = model.predict(X_)
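
Note that `y_preds` is computed on the same rows the model was fitted on, so the scores that follow describe in-sample fit. One way to estimate out-of-sample performance is to hold out a stratified test set before fitting. The plain-Python sketch below shows the idea; the function `stratified_split`, the 10% test fraction and the toy labels are assumptions for illustration, and scikit-learn's `train_test_split(..., stratify=y)` offers the same functionality.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split row indices so each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = int(round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with an imbalance loosely resembling the task
labels = [0] * 10 + [1] * 70 + [2] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.1)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

The model would then be fitted on the training rows only and scored on the held-out rows.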

Evaluating the results


In [25]:
report = classification_report( y, y_preds )

In [26]:
print(report)


             precision    recall  f1-score   support

          0       0.44      0.60      0.51      1430
          1       0.97      0.91      0.94     19190
          2       0.81      0.95      0.88      4163

avg / total       0.91      0.90      0.90     24783
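
The `f1-score` column is the harmonic mean of precision and recall, and in the scikit-learn versions that print an `avg / total` row, that row is a support-weighted average of each column. The sketch below recomputes both from the rounded values printed above, so the results can differ from the report in the last digit; the helper names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, f1-score, support) per class, as printed in the report
rows = {
    0: (0.44, 0.60, 0.51, 1430),
    1: (0.97, 0.91, 0.94, 19190),
    2: (0.81, 0.95, 0.88, 4163),
}


def weighted_average(column):
    """Average one report column, weighting each class by its support."""
    total_support = sum(r[3] for r in rows.values())
    return sum(r[column] * r[3] for r in rows.values()) / float(total_support)


for label in sorted(rows):
    p, r, reported_f1, _ = rows[label]
    print("class {}: recomputed f1 {:.3f}, reported {:.2f}".format(
        label, f1(p, r), reported_f1))

print("weighted precision {:.3f}".format(weighted_average(0)))
print("weighted recall    {:.3f}".format(weighted_average(1)))
```

The weighted averages are dominated by class 1, which is why they sit near 0.90 even though precision for class 0 is only 0.44.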


In [27]:
plt.rc('pdf', fonttype=42)
plt.rcParams['ps.useafm'] = True
plt.rcParams['pdf.use14corefonts'] = True
plt.rcParams['text.usetex'] = True
plt.rcParams['font.serif'] = 'Times'
plt.rcParams['font.family'] = 'serif'
from sklearn.metrics import confusion_matrix

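# Store the result under a new name so the imported confusion_matrix
# function is not overwritten (re-running the cell would otherwise fail)
cm = confusion_matrix(y,y_preds)
matrix_proportions = np.zeros((3,3))
for i in range(0,3):
    matrix_proportions[i,:] = cm[i,:]/float(cm[i,:].sum())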
names=['Hate','Offensive','Neither']
confusion_df = pd.DataFrame(matrix_proportions, index=names,columns=names)
plt.figure(figsize=(5,5))
seaborn.heatmap(confusion_df,annot=True,annot_kws={"size": 12},cmap='gist_gray_r',cbar=False, square=True,fmt='.2f')
plt.ylabel(r'\textbf{True categories}',fontsize=14)
plt.xlabel(r'\textbf{Predicted categories}',fontsize=14)
plt.tick_params(labelsize=12)

#Uncomment line below if you want to save the output
#plt.savefig('confusion.pdf')
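
Dividing each row of the confusion matrix by its row sum turns counts into per-class proportions, so the diagonal holds each class's recall. A minimal sketch on an invented 3x3 matrix follows; the counts are made up and are not the notebook's results.

```python
# Invented counts: rows are true classes, columns are predicted classes
toy_counts = [
    [6, 3, 1],
    [2, 16, 2],
    [0, 1, 9],
]


def row_normalize(matrix):
    """Divide every cell by the sum of its row."""
    return [[cell / float(sum(row)) for cell in row] for row in matrix]


proportions = row_normalize(toy_counts)
for name, row in zip(["Hate", "Offensive", "Neither"], proportions):
    print("{:<10s}".format(name), ["{:.2f}".format(v) for v in row])
```

Row normalization keeps the large offensive-language class from visually dominating the heatmap.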



In [28]:
#True distribution
y.hist()


Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x121eb5890>

In [29]:
pd.Series(y_preds).hist()


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x123f48a90>